feat: add OpenAI diarization support by 8times4 · Pull Request #651 · TanStack/ai

8times4 · 2026-05-27T15:18:33Z

🎯 Changes

This change adds diarization support for OpenAI's gpt-4o-transcribe-diarize model, based on https://developers.openai.com/api/docs/guides/speech-to-text?lang=javascript

✅ Checklist

I have followed the steps in the Contributing guide.
I have tested this code locally with pnpm run test:pr.

🚀 Release Impact

This change affects published code, and I have generated a changeset.
This change is docs/CI/dev-only (no release).

Summary by CodeRabbit

New Features
- Added speaker diarization for OpenAI transcriptions with automatic speaker labeling and segment-level timestamps
- New diarized JSON response format for structured, speaker-labeled transcripts
- Transcription API now supports GPT-4o diarize model alongside existing models
Documentation
- Updated transcription docs, examples, and best practices with diarization usage, options, and constraints

coderabbitai · 2026-05-27T15:26:08Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.


📝 Walkthrough


This PR adds OpenAI speaker diarization support for transcription, enabling speaker-labeled segment output via the gpt-4o-transcribe-diarize model. It introduces type contracts for diarized_json format, extends provider options with diarization parameters, implements model detection and validation in the adapter, adds comprehensive tests, wires diarization into E2E infrastructure and examples, and documents the feature across guides and API references.
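As a rough illustration of the segment mapping described above, the sketch below converts speaker-labeled segments into a normalized segment list. The `DiarizedSegmentSketch` and `SegmentSketch` shapes and the `toSegmentSketches` helper are hypothetical names invented for this example; the adapter's real types and ID normalization may differ.

```typescript
// Hypothetical input shape: one speaker-labeled segment as a provider might return it.
// Field names are assumptions for illustration, not the SDK's actual types.
interface DiarizedSegmentSketch {
  id: string
  start: number
  end: number
  text: string
  speaker: string
}

// Hypothetical output shape, loosely modeled on a TranscriptionSegment with a speaker label.
interface SegmentSketch {
  id: number
  start: number
  end: number
  text: string
  speaker?: string
}

// Map provider segments to the normalized shape.
// Assumption: IDs are normalized to the segment's position in the list.
function toSegmentSketches(
  diarized: ReadonlyArray<DiarizedSegmentSketch>,
): Array<SegmentSketch> {
  return diarized.map((segment, index) => ({
    id: index,
    start: segment.start,
    end: segment.end,
    text: segment.text.trim(),
    speaker: segment.speaker,
  }))
}
```

Keeping the mapping in one small pure function would make it straightforward to unit test without calling the provider.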

Changes

OpenAI Transcription Diarization Feature

Layer / File(s)	Summary
TranscriptionResponseFormat type and type imports: `packages/ai/src/types.ts`, `packages/ai-client/src/generation-types.ts`, `packages/ai/src/activities/generateTranscription/index.ts`	Introduces `TranscriptionResponseFormat` type alias enumerating supported formats and updates `TranscriptionOptions`, `TranscriptionGenerateInput`, and `TranscriptionActivityOptions` to reference it, enabling `diarized_json` format across the SDK.
OpenAI provider option types: `packages/ai-openai/src/audio/transcription-provider-options.ts`	Introduces `OpenAITranscriptionResponseFormat` union type and extends `OpenAITranscriptionProviderOptions` with `response_format`, `prompt`, and `chunking_strategy` fields for diarization configuration.
OpenAI adapter diarization implementation: `packages/ai-openai/src/adapters/transcription.ts`	Detects diarization-capable models, validates diarization constraints (no prompt, no include, no timestamp_granularities, speaker count limits), auto-configures `chunking_strategy: auto` for diarization, maps `diarized_json` request/response, parses diarized segments with speaker labels into `TranscriptionSegment[]`, and preserves non-diarized transcription backward compatibility.
Validation and format mapping: `packages/ai-openai/src/adapters/transcription.ts`	Adds `validateDiarizationOptions` and extends response-format mapping to include `diarized_json`, enforcing diarization-only constraints and validating speaker metadata.
Diarization adapter tests: `packages/ai-openai/tests/transcription-adapter.test.ts`	Comprehensive test suite verifying diarization defaults (`response_format: diarized_json`, `chunking_strategy: auto`), explicit option forwarding, null chunking_strategy handling, segment ID normalization, alternate response format acceptance, and validation errors for unsupported options, speaker metadata constraints, and model/feature mismatches.
E2E test harness and feature routing: `testing/e2e/src/lib/types.ts`, `testing/e2e/src/lib/feature-support.ts`, `testing/e2e/src/lib/features.ts`, `testing/e2e/src/lib/media-providers.ts`, `testing/e2e/src/lib/server-functions.ts`, `testing/e2e/src/routes/$provider/$feature.tsx`	Extends E2E test infrastructure with `transcription-diarization` feature type, marks OpenAI as the only supporting provider, configures media feature routing, adds `feature` parameter to adapter creation to select model variants, and extends server function schemas to accept `responseFormat` and `modelOptions`.
E2E UI and API route updates: `testing/e2e/src/components/TranscriptionUI.tsx`, `testing/e2e/src/routes/api.transcription.ts`, `testing/e2e/src/routes/api.transcription.stream.ts`	Updates `TranscriptionUI` to accept `feature` prop, conditionally build diarization `modelOptions`, render speaker labels in segments, and updates API routes to parse and forward `responseFormat`, `modelOptions`, and `feature` through transcription generation with feature-aware adapter creation.
E2E fixtures and diarization test suite: `testing/e2e/fixtures/transcription/`, `testing/e2e/tests/transcription.spec.ts`	Adds diarized transcription response fixture with speaker-labeled segments and end-to-end test assertions for segment text, speaker labels, and delivery modes (SSE, HTTP stream, fetcher).
Example app transcription provider types: `examples/ts-react-chat/src/lib/audio-providers.ts`	Introduces `openai-diarize` provider ID and extends `TranscriptionProviderConfig` with optional `transcriptionOptions` for provider-specific diarization settings.
Example app server functions and routing: `examples/ts-react-chat/src/lib/server-audio-adapters.ts`, `examples/ts-react-chat/src/lib/server-fns.ts`, `examples/ts-react-chat/src/routes/api.transcribe.ts`, `examples/ts-react-chat/src/routes/generations.transcription.tsx`	Extends server functions to accept `responseFormat` and `modelOptions`, adds `openai-diarize` provider routing to `gpt-4o-transcribe-diarize` model, and wires diarization parameters through transcription generation and UI in example application.
Knip config update: `knip.json`	Removes `packages/ai-openai/src/audio/transcribe-provider-options.ts` from Knip ignore list to enable unused export detection.
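The diarization constraints listed in the table (no prompt, no include, no timestamp_granularities, speaker metadata limits) could be checked along the lines of the sketch below. This is not the PR's `validateDiarizationOptions`. The function name, the options shape, the requirement that names and references have equal length, and the `MAX_KNOWN_SPEAKERS` value are all assumptions made for illustration.

```typescript
// Hypothetical subset of the options a diarization request might carry.
interface DiarizeOptionsSketch {
  prompt?: string
  include?: Array<string>
  timestamp_granularities?: Array<string>
  known_speaker_names?: Array<string>
  known_speaker_references?: Array<string>
}

// Assumed limit for illustration only; check the provider documentation for the real value.
const MAX_KNOWN_SPEAKERS = 4

// Throw early, with a readable message, instead of letting the API reject the request later.
function validateDiarizationSketch(options: DiarizeOptionsSketch): void {
  const unsupported = (
    ['prompt', 'include', 'timestamp_granularities'] as const
  ).filter((key) => options[key] !== undefined)
  if (unsupported.length > 0) {
    throw new Error(
      `Diarization models do not support: ${unsupported.join(', ')}`,
    )
  }

  const names = options.known_speaker_names ?? []
  const references = options.known_speaker_references ?? []
  // Assumption: each known speaker name needs exactly one reference sample.
  if (names.length !== references.length) {
    throw new Error(
      'known_speaker_names and known_speaker_references must have the same length',
    )
  }
  if (names.length > MAX_KNOWN_SPEAKERS) {
    throw new Error(`At most ${MAX_KNOWN_SPEAKERS} known speakers are supported`)
  }
}
```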

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

tombeckenham
AlemTuzlak
jherr

"🐰 I hopped through code with ears held high,
I labeled speakers as they spoke nearby,
Chunking set to auto, segments all align,
Diarized JSON makes each voice shine! 🎧✨"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'feat: add OpenAI diarization support' accurately and concisely describes the main feature addition—diarization support for OpenAI's transcription models.
Description check	✅ Passed	The PR description follows the template structure, includes a clear summary of changes linking to OpenAI documentation, and has all checklist items properly addressed with required changeset confirmation.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/ai-openai/src/adapters/transcription.ts`:
- Around line 267-285: The diarization validation is missing a local guard for
responseFormat: update validateDiarizationOptions (used by transcribe and
guarded by isDiarizeTranscriptionModel) to throw when
modelOptions.responseFormat (or the mapped value from mapResponseFormat) is not
one of the allowed values ["json","text","diarized_json"]; ensure transcribe()
cannot send srt/vtt/verbose_json for diarize models by checking
modelOptions.responseFormat (or resolved response format) early and throwing a
clear error stating diarization models only support json, text, and
diarized_json; reference validateDiarizationOptions, transcribe,
mapResponseFormat, and isDiarizeTranscriptionModel when applying the change.
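A minimal sketch of the guard this comment asks for might look like the following. The helper names (`looksLikeDiarizeModel`, `assertDiarizeResponseFormat`) and the suffix-based model check are assumptions for illustration; the adapter's actual `isDiarizeTranscriptionModel` and `validateDiarizationOptions` may be implemented differently.

```typescript
// Assumed union of response formats; only 'diarized_json' and the allowed list come from the review.
type ResponseFormatSketch =
  | 'json'
  | 'text'
  | 'srt'
  | 'verbose_json'
  | 'vtt'
  | 'diarized_json'

// Formats the review says diarization models accept.
const DIARIZE_FORMATS: ReadonlyArray<ResponseFormatSketch> = [
  'json',
  'text',
  'diarized_json',
]

// Assumption: diarization-capable models can be recognized by a "-diarize" suffix.
function looksLikeDiarizeModel(model: string): boolean {
  return model.endsWith('-diarize')
}

// Reject srt/vtt/verbose_json for diarize models before any request is sent.
function assertDiarizeResponseFormat(
  model: string,
  format: ResponseFormatSketch | undefined,
): void {
  if (!looksLikeDiarizeModel(model) || format === undefined) return
  if (!DIARIZE_FORMATS.includes(format)) {
    throw new Error(
      `${model} only supports json, text, and diarized_json response formats; received "${format}"`,
    )
  }
}
```

Calling such a guard at the top of `transcribe()` would surface the problem locally rather than as a provider-side error.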


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7c4b4b31-fb90-4e00-9d8f-1454f513e089

📥 Commits

Reviewing files that changed from the base of the PR and between 5634f18 and a59d368.

📒 Files selected for processing (13)

.changeset/openai-transcription-diarization.md
docs/adapters/openai.md
docs/comparison/vercel-ai-sdk.md
docs/media/generation-hooks.md
docs/media/transcription.md
docs/reference/interfaces/TranscriptionOptions.md
packages/ai-client/src/generation-types.ts
packages/ai-openai/src/adapters/transcription.ts
packages/ai-openai/src/audio/transcription-provider-options.ts
packages/ai-openai/tests/transcription-adapter.test.ts
packages/ai/skills/ai-core/media-generation/SKILL.md
packages/ai/src/activities/generateTranscription/index.ts
packages/ai/src/types.ts

coderabbitai · 2026-05-28T11:11:06Z

Actionable comments posted: 0

nx-cloud · 2026-06-04T03:52:48Z

View your CI Pipeline Execution ↗ for commit 58aa20c

Command	Status	Duration	Result
`nx run-many --targets=build --exclude=examples/...`	✅ Succeeded	55s	View ↗

☁️ Nx Cloud last updated this comment at 2026-06-16 21:54:33 UTC

pkg-pr-new · 2026-06-04T03:53:28Z


@tanstack/ai

npm i https://pkg.pr.new/@tanstack/ai@651

@tanstack/ai-anthropic

npm i https://pkg.pr.new/@tanstack/ai-anthropic@651

@tanstack/ai-client

npm i https://pkg.pr.new/@tanstack/ai-client@651

@tanstack/ai-code-mode

npm i https://pkg.pr.new/@tanstack/ai-code-mode@651

@tanstack/ai-code-mode-skills

npm i https://pkg.pr.new/@tanstack/ai-code-mode-skills@651

@tanstack/ai-devtools-core

npm i https://pkg.pr.new/@tanstack/ai-devtools-core@651

@tanstack/ai-elevenlabs

npm i https://pkg.pr.new/@tanstack/ai-elevenlabs@651

@tanstack/ai-event-client

npm i https://pkg.pr.new/@tanstack/ai-event-client@651

@tanstack/ai-fal

npm i https://pkg.pr.new/@tanstack/ai-fal@651

@tanstack/ai-gemini

npm i https://pkg.pr.new/@tanstack/ai-gemini@651

@tanstack/ai-grok

npm i https://pkg.pr.new/@tanstack/ai-grok@651

@tanstack/ai-groq

npm i https://pkg.pr.new/@tanstack/ai-groq@651

@tanstack/ai-isolate-cloudflare

npm i https://pkg.pr.new/@tanstack/ai-isolate-cloudflare@651

@tanstack/ai-isolate-node

npm i https://pkg.pr.new/@tanstack/ai-isolate-node@651

@tanstack/ai-isolate-quickjs

npm i https://pkg.pr.new/@tanstack/ai-isolate-quickjs@651

@tanstack/ai-mcp

npm i https://pkg.pr.new/@tanstack/ai-mcp@651

@tanstack/ai-ollama

npm i https://pkg.pr.new/@tanstack/ai-ollama@651

@tanstack/ai-openai

npm i https://pkg.pr.new/@tanstack/ai-openai@651

@tanstack/ai-openrouter

npm i https://pkg.pr.new/@tanstack/ai-openrouter@651

@tanstack/ai-preact

npm i https://pkg.pr.new/@tanstack/ai-preact@651

@tanstack/ai-react

npm i https://pkg.pr.new/@tanstack/ai-react@651

@tanstack/ai-react-ui

npm i https://pkg.pr.new/@tanstack/ai-react-ui@651

@tanstack/ai-solid

npm i https://pkg.pr.new/@tanstack/ai-solid@651

@tanstack/ai-solid-ui

npm i https://pkg.pr.new/@tanstack/ai-solid-ui@651

@tanstack/ai-svelte

npm i https://pkg.pr.new/@tanstack/ai-svelte@651

@tanstack/ai-utils

npm i https://pkg.pr.new/@tanstack/ai-utils@651

@tanstack/ai-vue

npm i https://pkg.pr.new/@tanstack/ai-vue@651

@tanstack/ai-vue-ui

npm i https://pkg.pr.new/@tanstack/ai-vue-ui@651

@tanstack/openai-base

npm i https://pkg.pr.new/@tanstack/openai-base@651

@tanstack/preact-ai-devtools

npm i https://pkg.pr.new/@tanstack/preact-ai-devtools@651

@tanstack/react-ai-devtools

npm i https://pkg.pr.new/@tanstack/react-ai-devtools@651

@tanstack/solid-ai-devtools

npm i https://pkg.pr.new/@tanstack/solid-ai-devtools@651

commit: 58aa20c

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/media/transcription.md`:
- Line 561: The example hardcodes 'whisper-1' in the createOpenaiTranscription
call; update the docs to use the provider's latest transcription model constant
exported from the OpenAI adapter's model-meta.ts instead of a string literal.
Import or reference the exported latest-model symbol from that file (e.g., the
adapter's LATEST_* or DEFAULT_* transcription model constant) and pass that
symbol into createOpenaiTranscription so the docs always use the adapter-defined
current OpenAI transcription model.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b2c455a0-25d6-4921-8f26-77965d2791be

📥 Commits

Reviewing files that changed from the base of the PR and between 05dfb53 and fbb57a0.

📒 Files selected for processing (6)

.changeset/openai-transcription-diarization.md
docs/adapters/openai.md
docs/comparison/vercel-ai-sdk.md
docs/media/generation-hooks.md
docs/media/transcription.md
docs/reference/interfaces/TranscriptionOptions.md

✅ Files skipped from review due to trivial changes (5)

.changeset/openai-transcription-diarization.md
docs/media/generation-hooks.md
docs/comparison/vercel-ai-sdk.md
docs/adapters/openai.md
docs/reference/interfaces/TranscriptionOptions.md

coderabbitai

Caution

Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.


🛑 Comments failed to post (1)

docs/media/transcription.md (1)
561-561: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use the provider’s latest OpenAI transcription model in this example.

This changed snippet still hardcodes whisper-1; please update it to the latest OpenAI transcription model defined in the adapter model-meta.ts to keep docs aligned with project policy.

As per coding guidelines: “Use the latest model per provider in documentation example code, sourced from each adapter's model-meta.ts (newest gpt-*, claude-*, gemini-*, …)”.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/media/transcription.md` at line 561, The example hardcodes 'whisper-1'
in the createOpenaiTranscription call; update the docs to use the provider's
latest transcription model constant exported from the OpenAI adapter's
model-meta.ts instead of a string literal. Import or reference the exported
latest-model symbol from that file (e.g., the adapter's LATEST_* or DEFAULT_*
transcription model constant) and pass that symbol into
createOpenaiTranscription so the docs always use the adapter-defined current
OpenAI transcription model.

tombeckenham · 2026-06-04T04:03:58Z

Hi @8times4, thank you for this. Would you be able to create an e2e test for this using aimock? The tests are in the e2e test package. Ideally, adding a way to see the results on one of the ts-react-chat example pages would be great as well

tombeckenham · 2026-06-04T04:19:35Z

Code review

Found 3 issues:

No E2E test coverage was added for the diarization feature or behavior change (the new gpt-4o-transcribe-diarize model, the diarized_json responseFormat, speaker-labeled TranscriptionSegments, and the chunking_strategy and known_speaker_* options with their validation). (CLAUDE.md says "Every feature, bug fix, or behavior change MUST include E2E test coverage." and "Add or update E2E tests — this is mandatory for any feature, bug fix, or behavior change"; see also the new-feature row in the E2E table and the Pre-PR Quality Gate requiring pnpm --filter @tanstack/ai-e2e test:e2e. AGENTS.md and the reviews of prior transcription PRs #409 ("feat: extract @tanstack/openai-base and @tanstack/ai-utils packages") and #506 ("feat(ai-grok): audio, speech, and realtime adapters + example wiring") establish the same convention: update feature-support.ts, the test matrix, a fixture, and a spec.)

ai/packages/ai-openai/src/adapters/transcription.ts

Lines 140 to 150 in 05dfb53

    
        id: generateId(this.name),
        model,
        text: response.text,
        duration: response.duration,
        ...(segments.length > 0 && { segments }),
      }
    }
    if (useVerbose) {
      const response = (await this.client.audio.transcriptions.create({
        ...request,

The responseFormat union literal is duplicated (each copy with | 'diarized_json' added) across three locations instead of being extracted into a shared type. (CLAUDE.md says "Always look for repeated code or if the function you are trying to implement is already in another file" and "Review code at the end to see if you can make it more concise and less repetitive".)

ai/packages/ai/src/types.ts

Lines 1723 to 1732 in 05dfb53

    
      confidence?: number
      /** Speaker identifier, if diarization is enabled */
      speaker?: string
    }
    /**
     * A single word with timing information.
     */
    export interface TranscriptionWord {
      /** The transcribed word */
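One way to remove the duplicated union is to derive both the type and a runtime check from a single list. The sketch below assumes the member list; only `diarized_json` and the idea of a shared `TranscriptionResponseFormat` alias come from this PR. The `*Sketch` names are hypothetical.

```typescript
// Single source of truth for the supported formats.
// Assumption: the members other than 'diarized_json' are illustrative.
const TRANSCRIPTION_RESPONSE_FORMATS = [
  'json',
  'text',
  'srt',
  'verbose_json',
  'vtt',
  'diarized_json',
] as const

// The union type is derived from the list, so adding a format is a one-line change.
type TranscriptionResponseFormatSketch =
  (typeof TRANSCRIPTION_RESPONSE_FORMATS)[number]

// Every consumer references the shared alias instead of repeating the literal union.
interface TranscriptionOptionsSketch {
  responseFormat?: TranscriptionResponseFormatSketch
}

interface TranscriptionGenerateInputSketch {
  responseFormat?: TranscriptionResponseFormatSketch
}

// Runtime guard built from the same list, usable by request validators.
function isTranscriptionResponseFormat(
  value: string,
): value is TranscriptionResponseFormatSketch {
  return (TRANSCRIPTION_RESPONSE_FORMATS as ReadonlyArray<string>).includes(value)
}

const exampleOptions: TranscriptionOptionsSketch = { responseFormat: 'diarized_json' }
const exampleInput: TranscriptionGenerateInputSketch = {
  responseFormat: exampleOptions.responseFormat,
}
```

A schema validator could be built from the same list, which would keep the type and the runtime check from drifting apart.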

The validation guards in the newly added validateDiarizationOptions (and the guard in its caller) are incomplete and inconsistent with the modelOptions conventions: responseFormat is read from modelOptions through a camelCase cast, while the spread and all other fields use snake_case (response_format, chunking_strategy, known_speaker_*); the prompt rejection and the diarization-options guard inspect only top-level options, not the modelOptions paths; and the diarize-only restriction on chunking_strategy does not check modelOptions?.chunking_strategy. These gaps let invalid options bypass validation and surface as late HTTP 400 responses instead of early errors. (CLAUDE.md says "Don't create fallback code. It hides problems. Just display errors to the user".)

ai/packages/ai-openai/src/adapters/transcription.ts

Lines 339 to 370 in 05dfb53

    
          )
        }
      }
      protected mapResponseFormat(
        format?: OpenAITranscriptionResponseFormat,
      ): OpenAITranscriptionResponseFormat {
        if (!format) return 'json'
        return format
      }
    }
    /**
     * Creates an OpenAI transcription adapter with explicit API key.
     * Type resolution happens here at the call site.
     *
     * @param model - The model name (e.g., 'whisper-1')
     * @param apiKey - Your OpenAI API key
     * @param config - Optional additional configuration
     * @returns Configured OpenAI transcription adapter instance with resolved types
     *
     * @example
     * ```typescript
     * const adapter = createOpenaiTranscription('whisper-1', "sk-...");
     *
     * const result = await generateTranscription({
     *   adapter,
     *   audio: audioFile,
     *   language: 'en'
     * });
     * ```
     */
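To illustrate the bypass described in the third issue, the sketch below looks up each restricted option in both the top-level options and `modelOptions` before rejecting it. The `findOption` and `rejectUnsupportedOptions` helpers and the loosely typed options bags are hypothetical; they only show the shape of an early check that covers both paths.

```typescript
// Loosely typed bags stand in for the real option types (an assumption for brevity).
type OptionsBagSketch = Record<string, unknown>

// Look for a key at the top level first, then inside modelOptions.
function findOption(
  key: string,
  topLevel: OptionsBagSketch,
  modelOptions: OptionsBagSketch | undefined,
): unknown {
  return topLevel[key] !== undefined ? topLevel[key] : modelOptions?.[key]
}

// Throw as soon as any restricted key is present on either path,
// so the caller sees a local error instead of a late provider-side rejection.
function rejectUnsupportedOptions(
  restrictedKeys: ReadonlyArray<string>,
  topLevel: OptionsBagSketch,
  modelOptions: OptionsBagSketch | undefined,
): void {
  for (const key of restrictedKeys) {
    if (findOption(key, topLevel, modelOptions) !== undefined) {
      throw new Error(`"${key}" is not supported by diarization models`)
    }
  }
}
```

For example, `rejectUnsupportedOptions(['prompt'], {}, { prompt: 'hi' })` throws even though the top-level options are empty, which is the case the review says currently slips through.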

🤖 Generated with Claude Code


- Removed `transcribe-provider-options.ts` file and integrated its options into `transcription-provider-options.ts`.
- Updated documentation to reflect changes in response formats, emphasizing the use of `modelOptions.response_format` for diarization.
- Enhanced the transcription adapter to handle new model options and response formats, including support for speaker diarization.
- Adjusted various components and tests to accommodate the new structure and ensure compatibility with the updated transcription features.

8times4 · 2026-06-12T16:38:35Z

Thanks for the review @tombeckenham, should be fixed now.

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (1)

packages/ai/skills/ai-core/media-generation/SKILL.md (1)

284-286: ⚡ Quick win

Clarify the diarization contract.

chunking_strategy: 'auto' reads like an optional default here, but the adapter enforces that setting for gpt-4o-transcribe-diarize. Please phrase it as required behavior, not a caller-tunable default.

♻️ Suggested wording

-For speaker diarization, use openaiTranscription('gpt-4o-transcribe-diarize').
-It defaults to modelOptions.response_format: 'diarized_json' and chunking_strategy: 'auto';
+For speaker diarization, use openaiTranscription('gpt-4o-transcribe-diarize').
+The adapter enforces modelOptions.response_format: 'diarized_json' and chunking_strategy: 'auto';
 do not pass prompt, include, or timestamp_granularities with this model.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/ai/skills/ai-core/media-generation/SKILL.md` around lines 284 - 286,
The documentation wording implies chunking_strategy: 'auto' is an optional
default, but the adapter enforces that value for
openaiTranscription('gpt-4o-transcribe-diarize'); update the sentence in
SKILL.md to state that chunking_strategy must be 'auto' (required behavior)
rather than a caller-tunable default and keep the note that
modelOptions.response_format defaults to 'diarized_json' and callers must not
pass prompt, include, or timestamp_granularities when using
openaiTranscription('gpt-4o-transcribe-diarize').

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/ts-react-chat/src/lib/server-fns.ts`:
- Around line 84-86: The Zod enum used for transcription response formats
(TRANSCRIPTION_RESPONSE_FORMAT_SCHEMA) is missing 'diarized_json', causing
callers to be rejected before reaching generateTranscription; update the enum
used by both entrypoints to include 'diarized_json' (i.e., add the
'diarized_json' string value to TRANSCRIPTION_RESPONSE_FORMAT_SCHEMA and the
corresponding response-format validator used by the other entrypoint) so
diarized transcription requests validate successfully.

In `@packages/ai/src/types.ts`:
- Around line 1712-1717: The TranscriptionResponseFormat union is missing the
new 'diarized_json' member, causing type errors when callers set
TranscriptionOptions.responseFormat to that value; update the
TranscriptionResponseFormat type to include 'diarized_json' so the shared
contract matches the OpenAI provider, and ensure any other identical union (the
duplicate around the TranscriptionOptions declaration) is updated as well so
both TranscriptionResponseFormat and any repeated type declarations accept
'diarized_json'.

In `@testing/e2e/src/components/TranscriptionUI.tsx`:
- Around line 35-49: The default diarization E2E payload currently hardcodes
chunking_strategy ('chunking_strategy: "auto"') inside transcriptionInput ->
modelOptions when isDiarization is true; remove the chunking_strategy field from
transcriptionInput so the test covers the omitted-field/defaulting branch (leave
known_speaker_names and known_speaker_references as-is), and if you still want
explicit-option coverage add a separate test that constructs a
transcriptionInput with modelOptions.chunking_strategy = 'auto' to exercise the
passthrough path; update references to isDiarization, transcriptionInput, and
modelOptions accordingly.

In `@testing/e2e/src/lib/media-providers.ts`:
- Around line 99-108: The factory currently selects openaiTranscriptionModel
based on the optional feature param (in the openaiTranscriptionModel variable)
which can mismatch the actual transcription options; change
createOpenaiTranscription usage in the factories to derive the model from the
provided transcription options (e.g., inspect responseFormat and modelOptions
for diarization flags such as responseFormat === 'diarized_json' or
modelOptions.diarize) instead of relying on feature, or validate and reject when
diarization-specific options are present while feature !==
'transcription-diarization'; update the logic around openaiTranscriptionModel
and createOpenaiTranscription so diarization requests choose
'gpt-4o-transcribe-diarize' or fail fast.
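The last comment above could be addressed with a selection helper roughly like this sketch. The request shape, the `selectOpenaiTranscriptionModel` name, the choice of signals that count as "diarization requested", and the non-diarization fallback model are assumptions for illustration, not the test harness's actual code.

```typescript
// Hypothetical request shape for the E2E factory.
interface TranscriptionRequestSketch {
  feature?: string
  responseFormat?: string
  modelOptions?: {
    response_format?: string
    known_speaker_names?: Array<string>
  }
}

// Derive the model from the options actually sent, and fail fast on a mismatch.
function selectOpenaiTranscriptionModel(
  request: TranscriptionRequestSketch,
): string {
  const wantsDiarization =
    request.responseFormat === 'diarized_json' ||
    request.modelOptions?.response_format === 'diarized_json' ||
    (request.modelOptions?.known_speaker_names?.length ?? 0) > 0

  if (
    wantsDiarization &&
    request.feature !== undefined &&
    request.feature !== 'transcription-diarization'
  ) {
    throw new Error(
      `Diarization options were provided but feature is "${request.feature}"`,
    )
  }

  const isDiarizationFeature = request.feature === 'transcription-diarization'
  // Assumption: 'whisper-1' is the non-diarization model used by the harness.
  return wantsDiarization || isDiarizationFeature
    ? 'gpt-4o-transcribe-diarize'
    : 'whisper-1'
}
```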



ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3121ef25-add2-4697-a74d-2dbfb49daa47

📥 Commits

Reviewing files that changed from the base of the PR and between fbb57a0 and c7cf3fc.

📒 Files selected for processing (30)

docs/adapters/openai.md
docs/comparison/vercel-ai-sdk.md
docs/media/generation-hooks.md
docs/media/transcription.md
examples/ts-react-chat/src/lib/audio-providers.ts
examples/ts-react-chat/src/lib/server-audio-adapters.ts
examples/ts-react-chat/src/lib/server-fns.ts
examples/ts-react-chat/src/routes/api.transcribe.ts
examples/ts-react-chat/src/routes/generations.transcription.tsx
knip.json
packages/ai-client/src/generation-types.ts
packages/ai-openai/src/adapters/transcription.ts
packages/ai-openai/src/audio/transcribe-provider-options.ts
packages/ai-openai/src/audio/transcription-provider-options.ts
packages/ai-openai/tests/transcription-adapter.test.ts
packages/ai/skills/ai-core/media-generation/SKILL.md
packages/ai/src/activities/generateTranscription/index.ts
packages/ai/src/types.ts
testing/e2e/fixtures/transcription/basic.json
testing/e2e/fixtures/transcription/diarization.json
testing/e2e/src/components/TranscriptionUI.tsx
testing/e2e/src/lib/feature-support.ts
testing/e2e/src/lib/features.ts
testing/e2e/src/lib/media-providers.ts
testing/e2e/src/lib/server-functions.ts
testing/e2e/src/lib/types.ts
testing/e2e/src/routes/$provider/$feature.tsx
testing/e2e/src/routes/api.transcription.stream.ts
testing/e2e/src/routes/api.transcription.ts
testing/e2e/tests/transcription.spec.ts

💤 Files with no reviewable changes (2)

knip.json
packages/ai-openai/src/audio/transcribe-provider-options.ts

✅ Files skipped from review due to trivial changes (5)

testing/e2e/src/lib/types.ts
docs/comparison/vercel-ai-sdk.md
testing/e2e/fixtures/transcription/diarization.json
docs/media/generation-hooks.md
docs/media/transcription.md

🚧 Files skipped from review as they are similar to previous changes (5)

packages/ai/src/activities/generateTranscription/index.ts
packages/ai-client/src/generation-types.ts
docs/adapters/openai.md
packages/ai-openai/tests/transcription-adapter.test.ts
packages/ai-openai/src/adapters/transcription.ts

coderabbitai · 2026-06-12T16:49:03Z

+  const isDiarization = feature === 'transcription-diarization'
+  const transcriptionInput: TranscriptionGenerateInput = {
+    audio: TEST_AUDIO_BASE64,
+    language: 'en',
+    ...(isDiarization
+      ? {
+          modelOptions: {
+            response_format: 'diarized_json',
+            chunking_strategy: 'auto',
+            known_speaker_names: ['agent', 'customer'],
+            known_speaker_references: [TEST_AUDIO_BASE64, TEST_AUDIO_BASE64],
+          },
+        }
+      : {}),
+  }


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Leave chunking_strategy out of the default diarization E2E payload.

Line 43 hardcodes chunking_strategy: 'auto', so the new Playwright flow only proves the explicit-option path. The omitted-field/defaulting branch can regress without any of the three diarization E2E modes failing. I’d make the default test payload minimal and add a separate explicit-option case only if you still want passthrough coverage.

Suggested change

 const transcriptionInput: TranscriptionGenerateInput = {
   audio: TEST_AUDIO_BASE64,
   language: 'en',
   ...(isDiarization
     ? {
         modelOptions: {
           response_format: 'diarized_json',
-          chunking_strategy: 'auto',
           known_speaker_names: ['agent', 'customer'],
           known_speaker_references: [TEST_AUDIO_BASE64, TEST_AUDIO_BASE64],
         },
       }
     : {}),
 }

As per coding guidelines, testing/e2e/**/*.spec.ts: every feature, bug fix, or behavior change must include E2E coverage using Playwright + aimock.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

  const isDiarization = feature === 'transcription-diarization'
  const transcriptionInput: TranscriptionGenerateInput = {
    audio: TEST_AUDIO_BASE64,
    language: 'en',
    ...(isDiarization
      ? {
          modelOptions: {
            response_format: 'diarized_json',
            known_speaker_names: ['agent', 'customer'],
            known_speaker_references: [TEST_AUDIO_BASE64, TEST_AUDIO_BASE64],
          },
        }
      : {}),
  }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@testing/e2e/src/components/TranscriptionUI.tsx` around lines 35 - 49, The default diarization E2E payload currently hardcodes chunking_strategy ('chunking_strategy: "auto"') inside transcriptionInput -> modelOptions when isDiarization is true; remove the chunking_strategy field from transcriptionInput so the test covers the omitted-field/defaulting branch (leave known_speaker_names and known_speaker_references as-is), and if you still want explicit-option coverage add a separate test that constructs a transcriptionInput with modelOptions.chunking_strategy = 'auto' to exercise the passthrough path; update references to isDiarization, transcriptionInput, and modelOptions accordingly.

Source: Coding guidelines
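One way to cover both branches is a small payload builder that leaves chunking_strategy out by default and sets it only for an explicit-option case. This is a minimal sketch, not the code in TranscriptionUI.tsx: buildTranscriptionInput, the Sketch* types, and the placeholder audio string are hypothetical names introduced here, and the real TranscriptionGenerateInput may have a different shape.

```typescript
// Placeholder standing in for the real base64 test fixture (assumption).
const TEST_AUDIO_BASE64 = 'UklGRg=='

// Hypothetical, simplified shapes; the real TranscriptionGenerateInput may differ.
interface SketchModelOptions {
  response_format: 'diarized_json'
  chunking_strategy?: 'auto'
  known_speaker_names: Array<string>
  known_speaker_references: Array<string>
}

interface SketchTranscriptionInput {
  audio: string
  language: string
  modelOptions?: SketchModelOptions
}

// Builds the E2E payload. By default chunking_strategy is omitted so the
// adapter's defaulting branch is exercised; pass explicitChunking = true
// to exercise the passthrough path instead.
function buildTranscriptionInput(
  isDiarization: boolean,
  explicitChunking = false,
): SketchTranscriptionInput {
  const base: SketchTranscriptionInput = {
    audio: TEST_AUDIO_BASE64,
    language: 'en',
  }
  if (!isDiarization) return base
  return {
    ...base,
    modelOptions: {
      response_format: 'diarized_json',
      // Only the explicit-option case sets chunking_strategy.
      ...(explicitChunking ? { chunking_strategy: 'auto' as const } : {}),
      known_speaker_names: ['agent', 'customer'],
      known_speaker_references: [TEST_AUDIO_BASE64, TEST_AUDIO_BASE64],
    },
  }
}
```

The default call, buildTranscriptionInput(true), would back the main diarization flow, and buildTranscriptionInput(true, true) would back a separate passthrough case if that coverage is still wanted.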

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@testing/e2e/src/lib/media-providers.ts`:
- Around line 42-50: The code currently lets an internal flag
modelOptions.diarize flow into the OpenAI SDK; update the transcription request
construction to strip the diarize property before spreading modelOptions into
the SDK call—e.g., in the OpenAI transcription adapter where the request is
built, clone modelOptions and delete or omit the diarize key (while still using
getOpenaiTranscriptionModel(...) for detection), then spread the sanitized
object (e.g., sanitizedModelOptions) into request: { model, file,
...sanitizedModelOptions } so diarize is never sent upstream.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 61c5ddd3-e650-49bd-b749-865a32276717

📥 Commits

Reviewing files that changed from the base of the PR and between c7cf3fc and 58aa20c.

📒 Files selected for processing (8)

examples/ts-react-chat/src/lib/server-fns.ts
examples/ts-react-chat/src/routes/api.transcribe.ts
packages/ai-openai/src/adapters/transcription.ts
packages/ai/src/types.ts
testing/e2e/src/lib/media-providers.ts
testing/e2e/src/lib/server-functions.ts
testing/e2e/src/routes/api.transcription.stream.ts
testing/e2e/src/routes/api.transcription.ts

🚧 Files skipped from review as they are similar to previous changes (4)

examples/ts-react-chat/src/routes/api.transcribe.ts
testing/e2e/src/lib/server-functions.ts
packages/ai-openai/src/adapters/transcription.ts
examples/ts-react-chat/src/lib/server-fns.ts

coderabbitai · 2026-06-13T11:15:28Z

+function getOpenaiTranscriptionModel(options: TranscriptionAdapterOptions) {
+  const modelOptions = options.modelOptions
+  const isDiarizationRequest =
+    options.responseFormat === 'diarized_json' ||
+    modelOptions?.response_format === 'diarized_json' ||
+    modelOptions?.diarize === true ||
+    modelOptions?.chunking_strategy !== undefined ||
+    modelOptions?.known_speaker_names !== undefined ||
+    modelOptions?.known_speaker_references !== undefined


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "1) Find where diarize is populated in E2E request construction"
rg -n --type=ts -C2 '\bdiarize\b|response_format|chunking_strategy|known_speaker_' testing/e2e/src
echo
echo "2) Confirm modelOptions is forwarded from API routes"
rg -n --type=ts -C3 'generateTranscription\(|modelOptions,' \
  testing/e2e/src/routes/api.transcription.ts \
  testing/e2e/src/routes/api.transcription.stream.ts
echo
echo "3) Confirm OpenAI adapter request spread behavior"
rg -n --type=ts -C5 '\.\.\.modelOptions|TranscriptionCreateParamsNonStreaming|request\.response_format' \
  packages/ai-openai/src/adapters/transcription.ts

Repository: TanStack/ai

Length of output: 5925

Prevent internal modelOptions.diarize from reaching the OpenAI SDK

Current E2E payloads don’t set modelOptions.diarize (they use response_format: 'diarized_json', chunking_strategy, and known_speaker_*), and modelOptions is forwarded unchanged by both transcription routes.

The OpenAI adapter still spreads ...modelOptions into the SDK request (request: { model, file, ...modelOptions }), so if any caller ever adds modelOptions.diarize, it would be sent upstream as an unsupported parameter—omit diarize before building the request.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@testing/e2e/src/lib/media-providers.ts` around lines 42 - 50, The code currently lets an internal flag modelOptions.diarize flow into the OpenAI SDK; update the transcription request construction to strip the diarize property before spreading modelOptions into the SDK call—e.g., in the OpenAI transcription adapter where the request is built, clone modelOptions and delete or omit the diarize key (while still using getOpenaiTranscriptionModel(...) for detection), then spread the sanitized object (e.g., sanitizedModelOptions) into request: { model, file, ...sanitizedModelOptions } so diarize is never sent upstream.
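The sanitizing step described above could look roughly like the following. It is a sketch under assumptions: SketchModelOptions, sanitizeModelOptions, and buildRequest are hypothetical names, file is a plain string standing in for the real upload object, and the actual adapter builds its request differently.

```typescript
// Hypothetical option shape for illustration; the real adapter types may differ.
interface SketchModelOptions {
  diarize?: boolean
  response_format?: string
  chunking_strategy?: string
  known_speaker_names?: Array<string>
  known_speaker_references?: Array<string>
}

// Returns a copy of modelOptions without the internal `diarize` flag,
// leaving the caller's object untouched.
function sanitizeModelOptions(
  modelOptions: SketchModelOptions | undefined,
): Omit<SketchModelOptions, 'diarize'> {
  if (!modelOptions) return {}
  const { diarize: _diarize, ...sanitized } = modelOptions
  return sanitized
}

// Builds the request object. `model` and `file` stand in for the real values;
// only the sanitized options are spread, so `diarize` is never sent upstream.
function buildRequest(
  model: string,
  file: string,
  modelOptions?: SketchModelOptions,
) {
  return { model, file, ...sanitizeModelOptions(modelOptions) }
}
```

Destructuring with a rest element keeps every other option as-is, so new pass-through fields would not need to be listed by hand.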

tombeckenham

Thanks for the thorough follow-up here — the E2E coverage across all three transports, the openai-diarize option on the example page with speaker-labeled segments, the shared TranscriptionResponseFormat extraction, and the much more complete validateDiarizationOptions all address my earlier review. The response-mode discriminant ('diarized' | 'verbose' | 'plain') + request-plan refactor is a genuine improvement. 🙏

I'm requesting changes on one architectural point, plus the small cleanups it implies.

Requested change — keep `diarized_json` out of the shared union

diarized_json was added to the shared, cross-provider TranscriptionResponseFormat (packages/ai/src/types.ts:1712-1718). The problem: only the OpenAI adapter reads the top-level responseFormat. The other transcription adapters drive diarization from a boolean instead and ignore responseFormat entirely:

- packages/ai-elevenlabs/src/adapters/transcription.ts → modelOptions.diarize: boolean
- packages/ai-grok/src/adapters/transcription.ts → modelOptions.diarize: boolean
- packages/ai-fal/... → no diarization support

So after this change, responseFormat: 'diarized_json' type-checks for every provider but is silently ignored by three of them. diarized_json isn't a portable format — it's OpenAI's wire-format literal for one model (gpt-4o-transcribe-diarize). The shared union should only advertise formats a generic caller can actually request.

Please keep diarized_json in OpenAITranscriptionResponseFormat only and revert the shared TranscriptionResponseFormat to the portable set ('json' | 'text' | 'srt' | 'verbose_json' | 'vtt'). Diarization on OpenAI is already driven through modelOptions.response_format: 'diarized_json' (that's what the example and E2E use), so nothing in the feature path is lost.

Two cleanups fall out of this and should land together:

- OpenAITranscriptionResponseFormat = TranscriptionResponseFormat | 'diarized_json' (packages/ai-openai/src/audio/transcription-provider-options.ts:4-6) is currently redundant: diarized_json is already in the shared union, so it collapses to exactly TranscriptionResponseFormat. Once diarized_json moves out, this alias becomes a real extension again. ✅
- const topLevelResponseFormat = responseFormat as OpenAITranscriptionResponseFormat | undefined (packages/ai-openai/src/adapters/transcription.ts:253-255) is a no-op cast today and becomes a safe widening after the change, so drop the as and assign directly.
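A sketch of how the two types would relate after the requested change. The portable set is the one listed in the review; resolveResponseFormat is a hypothetical helper, and the precedence it applies (modelOptions.response_format over the top-level responseFormat) is an assumption for illustration, not something the adapter is confirmed to do.

```typescript
// Shared, cross-provider formats: only what a generic caller can request.
type TranscriptionResponseFormat =
  | 'json'
  | 'text'
  | 'srt'
  | 'verbose_json'
  | 'vtt'

// OpenAI-only extension: a real superset once diarized_json leaves the shared union.
type OpenAITranscriptionResponseFormat =
  | TranscriptionResponseFormat
  | 'diarized_json'

// Hypothetical helper showing that no `as` cast is needed: the shared union
// is a subset of the OpenAI union, so direct assignment type-checks.
function resolveResponseFormat(
  responseFormat: TranscriptionResponseFormat | undefined,
  modelOptionsFormat: OpenAITranscriptionResponseFormat | undefined,
): OpenAITranscriptionResponseFormat | undefined {
  const topLevelResponseFormat: OpenAITranscriptionResponseFormat | undefined =
    responseFormat
  // Assumed precedence: the provider-specific option wins when both are set.
  return modelOptionsFormat ?? topLevelResponseFormat
}
```

With these definitions, passing 'diarized_json' as the top-level responseFormat fails to compile, which is the scoping the review asks for.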

Direction we want — a portable `diarize` on/off

For where this should go cross-provider: the portable concept is diarization on/off, not a format string. And the output is already normalized — TranscriptionSegment.speaker?: string exists in the shared type and ElevenLabs already populates it. So the clean cross-provider surface is a top-level diarize?: boolean on TranscriptionOptions that each adapter maps to its own mechanism (OpenAI → diarize model + diarized_json; ElevenLabs/Grok → diarize: true), with results unified via segment.speaker.

I don't want to balloon this PR's scope. Either is fine with me:

- add a top-level diarize?: boolean wired for OpenAI only in this PR (ElevenLabs/Grok wiring as a follow-up), or
- keep this PR OpenAI-scoped via modelOptions and I'll open a follow-up issue for the portable flag.

Let me know which you'd prefer. (Note for whoever does the portable work: Grok currently types its speaker as number while the shared segment.speaker is string — that needs reconciling.)
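To make the portable direction concrete, here is a rough sketch of how a single diarize boolean might map onto each provider's mechanism, plus the speaker-type reconciliation mentioned in the note. Everything named here (Provider, ProviderRequestPlan, planDiarization, normalizeSpeaker) is hypothetical and not part of the current packages. The per-provider mappings follow the description above.

```typescript
type Provider = 'openai' | 'elevenlabs' | 'grok'

// Hypothetical result of translating the portable flag for one provider.
interface ProviderRequestPlan {
  model: string
  modelOptions: Record<string, unknown>
}

// Maps a portable `diarize` flag to a provider-specific request plan.
function planDiarization(
  provider: Provider,
  model: string,
  diarize: boolean,
): ProviderRequestPlan {
  if (!diarize) return { model, modelOptions: {} }
  switch (provider) {
    case 'openai':
      // OpenAI: switch to the diarize model and request diarized_json.
      return {
        model: 'gpt-4o-transcribe-diarize',
        modelOptions: { response_format: 'diarized_json' },
      }
    case 'elevenlabs':
    case 'grok':
      // ElevenLabs/Grok: keep the model and set the provider's boolean.
      return { model, modelOptions: { diarize: true } }
  }
}

// Normalizes a provider speaker label to the shared string type,
// covering a numeric speaker such as the one Grok is described as using.
function normalizeSpeaker(
  speaker: string | number | undefined,
): string | undefined {
  return speaker === undefined ? undefined : String(speaker)
}
```

Callers would then read segment.speaker as a string regardless of which adapter produced the segment.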

Minor

- Validation gap (packages/ai-openai/src/adapters/transcription.ts:458-467): the matching-length check only fires when both known_speaker_names and known_speaker_references are present. A lone array (one provided without the other) passes local validation and defers to a late opaque 400, the exact thing this validator exists to prevent. Please reject early when exactly one of the two is provided.
- Dead branch (testing/e2e/src/lib/media-providers.ts:47): modelOptions?.diarize === true is never set by any caller (the real switch is response_format: 'diarized_json'), and diarize isn't a valid OpenAI param. Please drop that clause so model selection is driven only by real fields.
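The early rejection asked for in the validation item could be sketched as follows. validateKnownSpeakers and the error wording are hypothetical; the real validateDiarizationOptions covers more cases and may report errors differently.

```typescript
// Hypothetical subset of the OpenAI diarization options.
interface KnownSpeakerOptions {
  known_speaker_names?: Array<string>
  known_speaker_references?: Array<string>
}

// Throws when exactly one of the two arrays is provided, or when both are
// provided with different lengths. Passes when both are absent or matched.
function validateKnownSpeakers(options: KnownSpeakerOptions): void {
  const names = options.known_speaker_names
  const references = options.known_speaker_references

  // Exactly one present: reject locally instead of waiting for a remote 400.
  if ((names === undefined) !== (references === undefined)) {
    throw new Error(
      'known_speaker_names and known_speaker_references must be provided together',
    )
  }

  // Both present: lengths must match so each name has a reference.
  if (names && references && names.length !== references.length) {
    throw new Error(
      'known_speaker_names and known_speaker_references must have the same length',
    )
  }
}
```

Comparing the two undefined checks with !== works as an exclusive-or, so the lone-array case is caught whichever of the two fields is missing.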

Blocking: the shared-union scoping + the two cleanups it implies. The rest is quick polish. Thanks again!

🤖 Generated with Claude Code

coderabbitai Bot reviewed May 27, 2026

Comment thread packages/ai-openai/src/adapters/transcription.ts

AlemTuzlak requested a review from tombeckenham June 3, 2026 14:45

8times4 added 2 commits June 4, 2026 13:39

add diarization support

c82735e

fix coderabbit recommendations

fbb57a0

tombeckenham force-pushed the feat/openai-transcription-diarization branch from 05dfb53 to fbb57a0 Compare June 4, 2026 03:47

coderabbitai Bot reviewed Jun 4, 2026

tombeckenham self-assigned this Jun 5, 2026

coderabbitai Bot reviewed Jun 12, 2026

fix coderabbit findings

58aa20c

coderabbitai Bot reviewed Jun 13, 2026

tombeckenham requested changes Jun 16, 2026
